[SharedOffloadRegion] Align blocks to page-size #43689

Merged
orozery merged 2 commits into vllm-project:main from neuralmagic:varun/align-test-tensor on Jun 3, 2026

@varun-sundar-rabindranath
Contributor

@varun-sundar-rabindranath varun-sundar-rabindranath commented May 26, 2026

Purpose

Align blocks in SharedOffloadRegion to page_size so that O_DIRECT succeeds.
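For context, Linux O_DIRECT may require the user-space buffer address (as well as the file offset and transfer length) to be aligned, typically to the storage device's logical block size; page alignment is a common safe choice. The minimal sketch below is not the SharedOffloadRegion code. It assumes an anonymous `mmap` stands in for the region's backing memory, and `round_up` / `rows_are_page_aligned` are hypothetical helpers. It shows that rounding the row stride up to a page multiple keeps every block's start address page-aligned, while an unrounded stride does not.

```python
import ctypes
import mmap

PAGE_SIZE = mmap.PAGESIZE


def round_up(x: int, multiple: int) -> int:
    """Round x up to the nearest multiple of `multiple`."""
    return ((x + multiple - 1) // multiple) * multiple


def rows_are_page_aligned(num_blocks: int, row_stride: int) -> bool:
    """Map num_blocks rows of row_stride bytes each and report whether every
    row (block) starts on a page boundary."""
    buf = mmap.mmap(-1, num_blocks * row_stride)  # anonymous mapping, page-aligned base
    anchor = ctypes.c_char.from_buffer(buf)       # expose the mapping's address
    base = ctypes.addressof(anchor)
    aligned = all((base + i * row_stride) % PAGE_SIZE == 0 for i in range(num_blocks))
    del anchor                                    # drop the buffer export before closing
    buf.close()
    return aligned


block_bytes = 3000  # hypothetical per-block size, not a page multiple
print(rows_are_page_aligned(4, block_bytes))                       # False
print(rows_are_page_aligned(4, round_up(block_bytes, PAGE_SIZE)))  # True
```

The second row of the unrounded layout starts 3000 bytes past a page boundary, which is exactly the kind of address O_DIRECT can reject.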

Changes:

  • Update CPUOffloadingSpec to account for alignment
  • Update SharedOffloadRegion to compute aligned row_strides
  • Update test_fs_tier.py to use SharedOffloadRegion, making the tests more robust and covering the interplay between fs_tier and SharedOffloadRegion

Interface change:

  • The CPUOffloadingSpec constructor takes a block_size_alignment integer argument that defaults to 1
  • SharedOffloadRegion directly consumes the user-specified cpu_bytes_to_use.

Test Plan

Run pytest -s tests/v1/kv_offload/test_fs_tier.py multiple times locally.

Test Result

Test passes.

@orozery
Collaborator

orozery commented May 26, 2026

Thanks @varun-sundar-rabindranath !
This fixes the test, but leaves the possible underlying issue.
Once we fix the underlying issue, this test would actually prove we fixed it.

@varun-sundar-rabindranath
Contributor Author

Thanks @varun-sundar-rabindranath ! This fixes the test, but leaves the possible underlying issue. Once we fix the underlying issue, this test would actually prove we fixed it.

Thanks for taking a look @orozery. These changes fix the root cause of the test failure. The current set of tests don't invoke SharedOffloadRegion; they are targeted tests that exercise the fs_tier directly.

IMO, making sure that the CPU backing tensor is always aligned should be a separate test. What do you think?

@varun-sundar-rabindranath
Contributor Author

Hi @orozery, I have updated the tests to use SharedOffloadRegion directly for better coverage, and have updated SharedOffloadRegion to always align its rows to page-size boundaries. PTAL! Thanks.

Collaborator

@orozery orozery left a comment

Thanks @varun-sundar-rabindranath !
Can you please change the PR title and description to reflect that it aligns CPU pages?


self.page_size = mmap.PAGESIZE
self.num_blocks = num_blocks
self.total_size_bytes, self._row_stride = self._maybe_update_buffer_size(
Collaborator

This potentially violates the user's cpu_bytes_to_use, allocating more than the user allowed.
I think we want to add an alignment classvar in cpu/spec.py, which will be overridden in tiering/spec.py:

  # CPUOffloadingSpec
  class CPUOffloadingSpec(OffloadingSpec):
      CPU_PAGE_SIZE_ALIGNMENT = 1

      def __init__(self, vllm_config, kv_cache_config):
          ...
          kv_bytes_per_offloaded_block = kv_bytes_per_block * self.block_size_factor
          self.cpu_page_size_per_worker = round_up(
              kv_bytes_per_offloaded_block // world_size,
              self.CPU_PAGE_SIZE_ALIGNMENT,
          )
          self.num_blocks = (
              int(cpu_bytes_to_use) // (self.cpu_page_size_per_worker * world_size)
              if self.cpu_page_size_per_worker > 0
              else 0
          )
          ...

  # TieringOffloadingSpec
  class TieringOffloadingSpec(CPUOffloadingSpec):
      CPU_PAGE_SIZE_ALIGNMENT = SharedOffloadRegion.PAGE_SIZE_ALIGNMENT
      ...
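The suggestion above boils down to a small piece of arithmetic. A minimal sketch, assuming `round_up` rounds up to a multiple of its second argument and using hypothetical sizes (6000-byte GPU blocks, block_size_factor=2, 2 workers, a 1,000,000-byte budget); `per_worker_layout` is an illustrative helper, not vLLM code:

```python
def round_up(x: int, multiple: int) -> int:
    """Round x up to the nearest multiple of `multiple` (assumed helper)."""
    return ((x + multiple - 1) // multiple) * multiple


def per_worker_layout(
    kv_bytes_per_block: int,
    block_size_factor: int,
    world_size: int,
    cpu_bytes_to_use: int,
    alignment: int,
) -> tuple[int, int]:
    """Return (cpu_page_size_per_worker, num_blocks) as in the suggestion above."""
    kv_bytes_per_offloaded_block = kv_bytes_per_block * block_size_factor
    # Each worker's share of an offloaded block is padded up to the alignment.
    cpu_page_size_per_worker = round_up(kv_bytes_per_offloaded_block // world_size, alignment)
    # Only whole blocks that fit inside cpu_bytes_to_use are allocated.
    num_blocks = (
        cpu_bytes_to_use // (cpu_page_size_per_worker * world_size)
        if cpu_page_size_per_worker > 0
        else 0
    )
    return cpu_page_size_per_worker, num_blocks


print(per_worker_layout(6000, 2, 2, 1_000_000, alignment=1))     # (6000, 83)
print(per_worker_layout(6000, 2, 2, 1_000_000, alignment=4096))  # (8192, 61)
```

With alignment, 61 * 8192 * 2 = 999,424 bytes, so the allocation stays within cpu_bytes_to_use; the cost is 2192 bytes of padding in every per-worker slice.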

Contributor Author

@varun-sundar-rabindranath varun-sundar-rabindranath May 27, 2026

Nice catch. I agree that cpu_bytes_to_use should be respected.

Concern:
Expanding each individual cpu_page_size_per_worker looks like it would break some invariants and could introduce hard-to-catch bugs. That is, we would be going from

  B0 |<--- B0 W0 ---><--- B0 W1 ---><--- B0 W2 --->| 
  B1 |<--- B1 W0 ---><--- B1 W1 ---><--- B1 W2 --->| 
  B2 |<--- B2 W0 ---><--- B2 W1 ---><--- B2 W2 --->| 
  ...
 
where Bi - Block i ; Wj - Worker j

to

  B0 |<--- B0 W0 ---***pad***><--- B0 W1 ---***pad***><--- B0 W2 ---***pad***>| 
  B1 |<--- B1 W0 ---***pad***><--- B1 W1 ---***pad***><--- B1 W2 ---***pad***>| 
  B2 |<--- B2 W0 ---***pad***><--- B2 W1 ---***pad***><--- B2 W2 ---***pad***>| 
  ...
 
where Bi - Block i ; Wj - Worker j

One example is the assert in the CPU <-> GPU transfer:

  assert cpu_page_size == gpu_page_size * block_size_factor

This padding will have to be plumbed through and handled correctly.
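To make the concern concrete, here is a minimal sketch with hypothetical sizes (3000-byte GPU pages, block_size_factor=2, 3 workers, 4096-byte alignment), assuming the padded per-worker layout from the second diagram. Once each slice is rounded up, the quoted invariant no longer holds, and any code that still strides by the unpadded cpu_page_size computes the wrong offsets. `slice_offset` is an illustrative helper, not vLLM code.

```python
def round_up(x: int, multiple: int) -> int:
    """Round x up to the nearest multiple of `multiple`."""
    return ((x + multiple - 1) // multiple) * multiple


def slice_offset(block: int, worker: int, world_size: int, worker_stride: int) -> int:
    """Byte offset of worker `worker`'s slice of block `block` when every
    per-worker slice occupies `worker_stride` bytes."""
    return (block * world_size + worker) * worker_stride


gpu_page_size = 3000                                # hypothetical per-worker GPU page size
block_size_factor = 2
world_size = 3
cpu_page_size = gpu_page_size * block_size_factor   # 6000: what the transfer code expects
padded_page_size = round_up(cpu_page_size, 4096)    # 8192: per-worker padding

# The transfer-path invariant only holds for the unpadded size.
print(cpu_page_size == gpu_page_size * block_size_factor)     # True
print(padded_page_size == gpu_page_size * block_size_factor)  # False

# Code that still strides by cpu_page_size would look for B1 W0 in the wrong place.
print(slice_offset(1, 0, world_size, cpu_page_size))     # 18000
print(slice_offset(1, 0, world_size, padded_page_size))  # 24576
```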

Instead, I propose:

  B0 |<--- B0 W0 ---><--- B0 W1 ---><--- B0 W2 --->***pad***| 
  B1 |<--- B1 W0 ---><--- B1 W1 ---><--- B1 W2 --->***pad***| 
  B2 |<--- B2 W0 ---><--- B2 W1 ---><--- B2 W2 --->***pad***| 
  ...

which is a looser constraint and can be handled entirely within SharedOffloadRegion. cpu_bytes_to_use can be respected by:

  • Allocating fewer blocks (num_blocks) in CPUOffloadingSpec when padding is involved (communicated via a CPU_PAGE_SIZE_ALIGNMENT classvar or a constructor arg)
  • Passing cpu_bytes_to_use directly to SharedOffloadRegion
  • Introducing padding in SharedOffloadRegion that respects both the alignment and cpu_bytes_to_use
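The end-of-row padding proposal can be sketched as follows. This is a minimal illustration with hypothetical helper names (`row_layout`, `slice_offset`) and sizes (6000-byte worker slices, 3 workers, 4096-byte pages, a 1,000,000-byte budget), not the code SharedOffloadRegion actually ships. It assumes the region's base address is itself page-aligned, so a page-multiple row stride keeps every row aligned while worker slices inside a row stay contiguous and unpadded.

```python
def round_up(x: int, multiple: int) -> int:
    """Round x up to the nearest multiple of `multiple`."""
    return ((x + multiple - 1) // multiple) * multiple


def row_layout(
    bytes_per_worker: int, world_size: int, cpu_bytes_to_use: int, page_size: int
) -> tuple[int, int]:
    """Return (row_stride, num_blocks) for the end-of-row padding layout."""
    # Worker slices stay contiguous inside a row, so the per-worker page
    # size is unchanged by the padding.
    unpadded_row = bytes_per_worker * world_size
    # Padding is added once, at the end of the row.
    row_stride = round_up(unpadded_row, page_size)
    # Only as many rows as fit inside the user's budget.
    num_blocks = cpu_bytes_to_use // row_stride
    return row_stride, num_blocks


def slice_offset(block: int, worker: int, bytes_per_worker: int, row_stride: int) -> int:
    """Byte offset of worker `worker`'s slice of block `block`."""
    return block * row_stride + worker * bytes_per_worker


row_stride, num_blocks = row_layout(6000, world_size=3, cpu_bytes_to_use=1_000_000, page_size=4096)
print(row_stride, num_blocks)                # 20480 48
print(num_blocks * row_stride <= 1_000_000)  # True
print(slice_offset(1, 1, 6000, row_stride))  # 26480
```

Compared with per-worker padding, only the row stride changes, so code that indexes within a block can keep using the unpadded per-worker size.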

Comment thread tests/v1/kv_offload/test_fs_tier.py Outdated
Comment on lines +134 to +138
region, tensor, mock_view = _make_region_tensor_and_view(
num_blocks=4,
block_elements=_BLOCK_ELEMENTS,
instance_prefix="test-fs-tier",
)
Collaborator

Assuming we move the alignment code further up to spec.py, let's go back to a simple tensor allocation here, but using an aligned page size.

Contributor Author

Keeping this, since the new set of changes still updates SharedOffloadRegion directly. PTAL. Thanks 🙌

@varun-sundar-rabindranath varun-sundar-rabindranath changed the title from Fix test_fs_tier.py to [SharedOffloadRegion] Align blocks to page-size May 27, 2026
Comment thread vllm/v1/kv_offload/cpu/spec.py Outdated
Comment on lines +50 to +55
assert self.cpu_bytes_to_use >= aligned_kv_bytes_per_offloaded_block, (
f"CPU space insufficient for offloading. {self.cpu_bytes_to_use=} "
f"{kv_bytes_per_offloaded_block=} "
f"{aligned_kv_bytes_per_offloaded_block=} "
f"{self.block_size_alignment=}"
)
Collaborator

Why do we need this assert?

Contributor Author

This assert makes sure that num_blocks is not 0. If it fires, the user should increase cpu_bytes_to_use to run with CPU offloading enabled.
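The reasoning behind the assert can be shown in isolation: alignment can round a block up past the user's budget, so a block count that would be 1 without alignment becomes 0 with it. A minimal sketch, assuming a hypothetical `compute_num_blocks` helper (not vLLM code); it raises `ValueError` rather than using `assert`, since Python strips asserts under `-O`. That choice belongs to this sketch, not to the PR.

```python
def round_up(x: int, multiple: int) -> int:
    """Round x up to the nearest multiple of `multiple`."""
    return ((x + multiple - 1) // multiple) * multiple


def compute_num_blocks(
    cpu_bytes_to_use: int, kv_bytes_per_offloaded_block: int, alignment: int
) -> int:
    """Hypothetical helper: how many aligned blocks fit in cpu_bytes_to_use."""
    aligned_block_bytes = round_up(kv_bytes_per_offloaded_block, alignment)
    # Same condition as the assert above: without it, num_blocks could be 0.
    if cpu_bytes_to_use < aligned_block_bytes:
        raise ValueError(
            "CPU space insufficient for offloading: "
            f"{cpu_bytes_to_use=} {aligned_block_bytes=}; increase cpu_bytes_to_use"
        )
    return cpu_bytes_to_use // aligned_block_bytes


print(compute_num_blocks(6000, 6000, alignment=1))  # 1
try:
    compute_num_blocks(6000, 6000, alignment=4096)  # 8192 > 6000: no block fits
except ValueError as exc:
    print(exc)
```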

Comment thread vllm/v1/kv_offload/cpu/spec.py Outdated
self,
vllm_config: VllmConfig,
kv_cache_config: KVCacheConfig,
block_size_alignment: int = 1,
Collaborator

Let's use a classvar instead of introducing a new init param

Contributor Author

Updated to introduce a classvar.
I am a bit uncomfortable with the semantic difference between CPUOffloadingSpec.BLOCK_SIZE_ALIGNMENT and self.BLOCK_SIZE_ALIGNMENT; it seems to increase the surface for bugs. What do you think?
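The difference comes from Python attribute lookup: `self.BLOCK_SIZE_ALIGNMENT` is resolved starting at `type(self)`, so a subclass override wins, while `CPUOffloadingSpec.BLOCK_SIZE_ALIGNMENT` always reads the base-class value. A minimal sketch with stand-in classes (they model only the classvar, not the real specs; the 4096 override is hypothetical):

```python
def round_up(x: int, multiple: int) -> int:
    """Round x up to the nearest multiple of `multiple`."""
    return ((x + multiple - 1) // multiple) * multiple


class CPUOffloadingSpec:
    # Stand-in for the real spec; only the classvar lookup is modeled here.
    BLOCK_SIZE_ALIGNMENT = 1

    def aligned_via_self(self, nbytes: int) -> int:
        # Lookup starts at type(self), so a subclass override is used.
        return round_up(nbytes, self.BLOCK_SIZE_ALIGNMENT)

    def aligned_via_class(self, nbytes: int) -> int:
        # Naming the base class explicitly always reads the base value.
        return round_up(nbytes, CPUOffloadingSpec.BLOCK_SIZE_ALIGNMENT)


class TieringOffloadingSpec(CPUOffloadingSpec):
    BLOCK_SIZE_ALIGNMENT = 4096  # hypothetical override, e.g. a page size


spec = TieringOffloadingSpec()
print(spec.aligned_via_self(6000))   # 8192
print(spec.aligned_via_class(6000))  # 6000
```

A single mistaken `CPUOffloadingSpec.` prefix inside a method silently drops the subclass's alignment, which is the bug surface described above.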

Comment thread vllm/v1/kv_offload/cpu/spec.py
@mergify
Contributor

mergify Bot commented May 28, 2026

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @varun-sundar-rabindranath.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

Comment thread vllm/v1/kv_offload/cpu/spec.py Outdated
Comment on lines +55 to +60
assert int(cpu_bytes_to_use) >= self.kv_bytes_per_offloaded_block_pad, (
f"CPU space insufficient for offloading. {cpu_bytes_to_use=} "
f"{self.kv_bytes_per_offloaded_block=} "
f"{self.kv_bytes_per_offloaded_block_pad=} "
f"{self.BLOCK_SIZE_ALIGNMENT=}"
)
Collaborator

Same as the previous review: Why do we need this assert?

Contributor Author

I replied in the previous review - #43689 (comment)

Collaborator

IMO, this is a bit far-fetched.
And if we have a single block, what is it good for?

Contributor Author

I have reverted the assert.
However, I believe it is better to be defensive here. For example, I think zero num_blocks would make the mmap in SharedOffloadRegion fail. What do you think? Maybe we should handle it elsewhere.

Comment thread vllm/v1/kv_offload/cpu/spec.py Outdated
Comment thread tests/v1/kv_offload/test_fs_tier.py Outdated
@orozery
Collaborator

orozery commented May 30, 2026

@varun-sundar-rabindranath can you please address the remaining nits?

@varun-sundar-rabindranath
Contributor Author

@varun-sundar-rabindranath can you please address the remaining nits?

Hi @orozery, I have addressed the comments. PTAL! Thanks 🙌

Comment thread tests/v1/kv_offload/test_fs_tier.py Outdated
Comment thread tests/v1/kv_offload/test_fs_tier.py Outdated
Comment thread vllm/v1/kv_offload/cpu/shared_offload_region.py
Collaborator

@orozery orozery left a comment

@orozery orozery added the ready ONLY add when PR is ready to merge/full CI is needed label Jun 1, 2026
@varun-sundar-rabindranath varun-sundar-rabindranath force-pushed the varun/align-test-tensor branch 2 times, most recently from 7773cf1 to 8414e03 Compare June 1, 2026 23:40
@orozery
Collaborator

orozery commented Jun 2, 2026

varun sundar rabindranath added 2 commits June 3, 2026 04:06
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
@orozery orozery merged commit 3d76f39 into vllm-project:main Jun 3, 2026
48 checks passed
mvanhorn pushed a commit to mvanhorn/vllm that referenced this pull request Jun 4, 2026
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>
Signed-off-by: Matt Van Horn <455140+mvanhorn@users.noreply.github.com>
JisoLya pushed a commit to JisoLya/vllm that referenced this pull request Jun 5, 2026
Signed-off-by: varun sundar rabindranath <vsundarr@redhat.com>
Co-authored-by: varun sundar rabindranath <vsundarr@redhat.com>
Signed-off-by: JisoLya <523420504@qq.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed v1

2 participants